Full Text Clustering and Relationship Network Analysis of Biomedical Publications
Authors
Abstract
Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, this work proposes a novel method for clustering and analyzing complete biomedical article texts. To reduce dimensionality, the cosine coefficient is computed on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. A strategy and algorithm for Semi-supervised Affinity Propagation (SSAP) are then introduced to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that, by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. A directed relationship network and distribution matrix constructed from the clustering results make overlaps in scope and interests among BioMed publications easy to identify, providing a valuable analytical tool for editors, authors and readers.
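As a rough illustration of two ideas named in the abstract, the sketch below computes a cosine coefficient between two sparse document vectors and tallies a cluster-by-journal distribution matrix. It is not the authors' implementation and does not include SSAP itself. The function names (`term_vector`, `cosine_coefficient`, `distribution_matrix`), the whitespace tokenization, and the raw term-count weighting are all assumptions made for the example.

```python
import math
from collections import Counter


def term_vector(text):
    """Build a sparse term-frequency vector (term -> count).

    Assumption: lowercase whitespace tokenization; the paper's actual
    preprocessing is not described in the abstract.
    """
    return Counter(text.lower().split())


def cosine_coefficient(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    # Only terms present in both documents contribute to the dot product,
    # so no dense document-term matrix over all vectors is ever built.
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def distribution_matrix(cluster_labels, journal_names):
    """Count how many articles of each journal fall into each cluster.

    Returns a dict: cluster label -> Counter(journal name -> article count).
    """
    counts = {}
    for cluster, journal in zip(cluster_labels, journal_names):
        counts.setdefault(cluster, Counter())[journal] += 1
    return counts
```

The pairwise similarities produced this way could be passed to any clustering routine that accepts a precomputed similarity matrix. A cluster whose row in the distribution matrix spreads over several journals would suggest the kind of overlap in scope that the abstract mentions.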
Similar resources
Clustering MeSH Representations Of Biomedical Literature
Biomedical literature contains vital information for the analysis and interpretation of experiments in the biological sciences. Human reasoning is the primary method for extracting, synthesizing, and interpreting the results contained in the literature, yet the rate at which publications are produced is exponential. With the advent of digital, full-text publication and increasing computational ...
Investigation through and Clustering the Information Needs and Information Seeking Behavior of Seminary and University Students of Khorasan-e- Razavi with Neural Network Analysis
Background and Aim: This study aims to investigate and cluster the information needs and information seeking behavior of seminary and university students in Khorasan-e- Razavi using neural network analysis. Methods: This quantitative study is an applied, descriptive survey conducted with neural network analysis. Data were collected by a questionnaire based on the information needs and inf...
Systematic Characterizations of Text Similarity in Full Text Biomedical Publications
BACKGROUND Computational methods have been used to find duplicate biomedical publications in MEDLINE. Full text articles are becoming increasingly available, yet the similarities among them have not been systematically studied. Here, we quantitatively investigated the full text similarity of biomedical publications in PubMed Central. METHODOLOGY/PRINCIPAL FINDINGS 72,011 full text articles fr...
Distribution of information in biomedical abstracts and full-text publications
MOTIVATION Full-text documents potentially hold more information than their abstracts, but require more resources for processing. We investigated the added value of full text over abstracts in terms of information content and occurrences of gene symbol--gene name combinations that can resolve gene-symbol ambiguity. RESULTS We analyzed a set of 3902 biomedical full-text articles. Different key...
A Coherent Biomedical Literature Clustering and Summarization Approach Through Ontology-Enriched Graphical Representations
In this paper, we introduce a coherent biomedical literature clustering and summarization approach that employs a graphical representation method for text using a biomedical ontology. The key of the approach is to construct document cluster models as semantic chunks capturing the core semantic relationships in the ontology-enriched scale-free graphical representation of documents. These documen...